
Internet Archive Web Crawls

The Internet Archive discovers and captures web pages through many different web crawls.



Wide Crawl Number 12 - started March, 14th 2015 (web; 1.9M views)
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Tue Apr 7 18:33:46 PDT 2015 to Tue Apr 7 15:33:03 PDT 2015.
Topic: crawldata

Survey Crawl Number 7 (web; 1.2M views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl-hq10.us.archive.org:survey from Sat Feb 24 04:16:28 PST 2018 to Fri Feb 23 20:23:38 PST 2018.
Topic: crawldata

End Of Term 2016 UNT Crawls (collection; 1,275 items; 9.2M views)
End of Term 2016 Web Archive government web crawls by project partner the University of North Texas.
Topics: end of term, federal government, 2016, president, congress, university of north texas

Government Web & Data Archive (collection; 6,269 items; 10.7M views)
This collaborative project is an extension of the 2016 End of Term project, intended to document the federal government's web presence by archiving government websites and data. As part of this preservation effort, URLs supplied by partner institutions, as well as URLs nominated by the public, will be crawled regularly to provide an ongoing view of federal agencies' web and social media presence. Key partners on this effort are the Environmental Data & Governance...
Topics: government, data, federal, congress

(collection; 14.4M views)
End of Term 2008 crawl data gathered by Internet Archive on behalf of the California Digital Library. This data is currently not publicly accessible.

Crawl Data (collection; 32,956 items; 17.1M views)
Crawl Data. This data is currently not publicly accessible.

crawl_UNK (collection; 32,949 items; 17.1M views)
Crawl data. This data is currently not publicly accessible.

Live Web Proxy Crawls (web; 1.9M views)

Live Web Proxy Crawls (web; 8.4M views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl344.us.archive.org:survey from Thu Oct 12 08:48:34 PDT 2017 to Thu Oct 12 01:56:31 PDT 2017.
Topic: crawldata

Wide Crawl Number 12 - started March, 14th 2015 (web; 1.5M views)
Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Mon Apr 13 02:19:20 PDT 2015 to Sun Apr 12 20:48:07 PDT 2015.
Topic: crawldata

Live Web Proxy Crawls (web; 1.5M views)

World Wars Crawl (collection; 13 items; 11.7M views)
Web data related to World Wars I and II collected by Internet Archive in an experimental crawl sponsored by National Endowment for the Humanities and JISC. This data is currently not publicly accessible.

Live Web Proxy Crawls (web; 2.6M views)

Live Web Proxy Crawls (web; 975,920 views)

Live Web Proxy Crawls (web; 94,630 views)

Live Web Proxy Crawls (web; 7.3M views)

Wide Crawl started October 2011 (web; 598,579 views)
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Oct 7 04:29:15 PDT 2011 to Fri Oct 7 00:11:17 PDT 2011.
Topic: crawldata

YouTube 2007 Crawl (collection; 73,847 items; 10.2M views)
Data crawled from YouTube.com in 2007 by Internet Archive. These files are not currently accessible.

Wide Crawl started October 2011 (web; 473,569 views)
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Oct 7 14:36:17 PDT 2011 to Fri Oct 7 08:44:43 PDT 2011.
Topic: crawldata

Live Web Proxy Crawls (web; 17.3M views; 6 favorites; 1 review)
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Sun Mar 27 22:10:09 PDT 2011 to Mon Mar 28 05:27:05 PDT 2011.
Topic: crawldata

2004 Election (collection; 178 items; 11M views)
2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.

Shallow Crawl Started February 2013 (collection; 86 items; 15.3M views)
Shallow crawl started February 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.

Live Web Proxy Crawls (web; 3.1M views)

Live Web Proxy Crawls (web; 1.2M views)
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Mon Feb 12 21:42:38 PST 2018 to Mon Feb 12 15:20:34 PST 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Tue Jun 13 00:55:34 PDT 2017 to Mon Jun 12 19:36:27 PDT 2017.
Topic: crawldata
Data crawled by National Endowment for the Humanities and JISC on behalf of Internet Archive from Fri Aug 08 00:17:40 PDT 2008 to Thu Jun 26 05:29:33 PDT 2008.
Topic: crawldata

Live Web Proxy Crawls (web; 3.6M views)

Live Web Proxy Crawls (web; 181,109 views)

Live Web Proxy Crawls (web; 456,361 views)
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-liveweb.us.archive.org from 2012-05-18T17:38:26 UTC to 2012-05-19T02:47:39 UTC.
Topic: crawldata

End of Term 2008 UNT Dot Gov Crawl (collection; 282 items; 8.7M views)
End of Term 2008 crawl of .gov domains gathered by the University of North Texas (UNT). This data is currently not publicly accessible. UNT is a student-focused public research university located in Denton, Texas.

Shallow Crawl Started November 2012 (collection; 137 items; 9M views)
Shallow crawl started November 2012 that collects content 1 level deep, including embeds. This data is currently not publicly accessible.

Survey Crawl Number 7 (web; 955,075 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl843.us.archive.org:survey from Wed Mar 7 07:44:08 PST 2018 to Wed Mar 7 04:40:01 PST 2018.
Topic: crawldata

Live Web Proxy Crawls (web; 407,754 views)
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jul 19 07:27:42 PDT 2011 to Tue Jul 19 07:13:39 PDT 2011.
Topic: crawldata

Live Web Proxy Crawls (web; 403,878 views)

Live Web Proxy Crawls (web; 222,119 views)

Wide Crawl started February 2014 (web; 6.7M views)
Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata

Live Web Proxy Crawls (web; 205,752 views)

Wide Crawl started February 2014 (web; 6.5M views)
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
Topic: crawldata

Wide Crawl started February 2014 (web; 6.6M views)
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata

Hurricane Katrina (collection; 112 items; 8.4M views)
Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...

Live Web Proxy Crawls (web; 6.6M views)

Wide Crawl started February 2014 (web; 6.6M views)
Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata

Wide Crawl started February 2014 (web; 6.6M views)
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata

Live Web Proxy Crawls (web; 1.2M views)

Wikipedia Outlinks July 2011 (web; 138,374 views)
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Fri Jul 22 10:06:29 PDT 2011 to Fri Jul 22 04:11:06 PDT 2011.
Topic: crawldata

Live Web Proxy Crawls (web; 266,129 views)

Live Web Proxy Crawls (web; 415,204 views)
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jul 19 14:14:16 PDT 2011 to Tue Jul 19 12:45:40 PDT 2011.
Topic: crawldata

Wide Crawl started January 2012 (web; 113,586 views)
Internet Archive crawldata from Webwide Crawl, captured by crawl335.us.archive.org:wide from Wed Jan 18 07:35:48 PST 2012 to Wed Jan 18 00:23:43 PST 2012.
Topic: crawldata

Wide Crawl started January 2012 (web; 93,313 views)
Internet Archive crawldata from Webwide Crawl, captured by crawl335.us.archive.org:wide from Fri Jan 20 01:35:20 PST 2012 to Thu Jan 19 18:11:42 PST 2012.
Topic: crawldata

Survey Crawl Number 8 (web; 203,317 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl818.us.archive.org:survey from Sat Jan 12 11:20:21 PST 2019 to Sat Jan 12 08:14:51 PST 2019.
Topic: crawldata

Survey Crawl Number 8 (web; 199,374 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl339.us.archive.org:survey from Tue Jan 22 09:37:30 PST 2019 to Tue Jan 22 03:52:56 PST 2019.
Topic: crawldata

Survey Crawl Number 8 (web; 197,673 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl344.us.archive.org:survey from Thu Jan 10 16:19:12 PST 2019 to Thu Jan 10 14:12:18 PST 2019.
Topic: crawldata

Survey Crawl Number 8 (web; 161,009 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl835.us.archive.org:survey from Wed Jan 16 13:02:07 PST 2019 to Wed Jan 16 13:30:45 PST 2019.
Topic: crawldata

Live Web Proxy Crawls (web; 1.3M views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl453.us.archive.org:survey from Mon May 26 22:33:15 PDT 2014 to Mon May 26 23:08:51 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl419.us.archive.org:survey from Tue May 27 01:34:25 PDT 2014 to Mon May 26 22:52:57 PDT 2014.
Topic: crawldata

End Of Term 2016 Library of Congress Crawls (collection; 3,892 items; 4.6M views)
End of Term 2016 Web Archive government web crawls by project partner the Library of Congress.
Topics: end of term, federal government, 2016, president, congress, library of congress, web, data, library...

Wide Crawl Started January 2013 (web; 70,691 views)
Internet Archive crawldata from Webwide Crawl, captured by crawl338.us.archive.org:wide from Fri Mar 29 16:07:59 PDT 2013 to Fri Mar 29 10:59:14 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Tue Feb 5 01:44:34 PST 2019 to Mon Feb 4 23:59:22 PST 2019.
Topic: crawldata

UNT Web (collection; 35 items; 5.2M views)
This collection contains all collaborative crawl data contributed by University of North Texas (UNT).
Topics: UNT, web, texas, eot

Survey Crawl Number 8 (web; 431,668 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl825.us.archive.org:survey from Tue Jan 15 00:32:09 PST 2019 to Tue Jan 15 02:46:53 PST 2019.
Topic: crawldata

Survey Crawl Number 8 (web; 443,315 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl835.us.archive.org:survey from Tue Jan 29 00:07:09 PST 2019 to Mon Jan 28 17:38:10 PST 2019.
Topic: crawldata

End of Term 2016 Post-Inauguration Crawls (collection; 5,308 items; 4.3M views)
This collection contains web crawls performed as part of the post-inauguration crawl for the End of Term Web Archive, a collaborative project that aims to preserve the U.S. federal government web presence at each change of administration. Content includes publicly accessible government websites hosted on .gov, .mil, and relevant non-.gov domains, as well as government social media materials. The web archiving was performed in the Winter of 2016 and Spring of 2017 to capture websites...
Topics: end of term, federal government, 2016, president, congress

Wide Crawl Number 12 - started March, 14th 2015 (web; 105,395 views)
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Mon Apr 6 23:02:44 PDT 2015 to Mon Apr 6 17:01:10 PDT 2015.
Topic: crawldata

Live Web Proxy Crawls (web; 624,369 views)

Live Web Proxy Crawls (web; 6M views)
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Mon Mar 28 12:43:47 PDT 2011 to Mon Mar 28 16:56:17 PDT 2011.
Topic: crawldata

Survey Crawl Number 8 (web; 455,909 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl838.us.archive.org:survey from Fri Jan 18 06:42:06 PST 2019 to Fri Jan 18 00:46:03 PST 2019.
Topic: crawldata
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl838.us.archive.org:survey from Thu Feb 18 11:29:30 PST 2016 to Thu Feb 18 04:22:48 PST 2016.
Topic: crawldata

Top 150 Crawl (collection; 30 items; 5.3M views)
Top 150 Alexa sites crawl performed by Internet Archive. This data is currently not publicly accessible.

Survey Crawl Number 8 (web; 339,975 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl339.us.archive.org:survey from Sat Jan 12 18:55:42 PST 2019 to Sat Jan 12 19:13:13 PST 2019.
Topic: crawldata

Live Web Proxy Crawls (web; 242,740 views)

Live Web Proxy Crawls (web; 770,526 views)
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl836.us.archive.org:survey from Fri Feb 19 00:17:11 PST 2016 to Thu Feb 18 18:42:06 PST 2016.
Topic: crawldata